Lempel-Ziv Factorization May Be Harder Than Computing All Runs
Abstract
The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over an ordered alphabet. Both problems are known to be solvable by RAM algorithms in O(n log σ) time, where n is the length of the input string and σ is the number of distinct letters in it. We prove an Ω(n log σ) lower bound on the number of comparisons required to construct the Lempel-Ziv factorization and thereby conclude that the popular technique of computing runs via the Lempel-Ziv factorization cannot achieve an o(n log σ) time bound. In contrast, we exhibit an O(n) decision tree algorithm that finds all runs in a string. Therefore, in the decision tree model the runs problem is easier than the Lempel-Ziv factorization. This supports the conjecture that there is a linear-time RAM algorithm for finding all runs.
1998 ACM Subject Classification: F.2.2 Pattern Matching
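To make the notion of a run concrete, here is a minimal brute-force sketch in Python. It is not the O(n) decision tree algorithm from the abstract. It assumes the usual definition of a run: a substring whose length is at least twice its smallest period and which cannot be extended to the left or right with the same period. The function name `find_runs` and the (start, end, period) output format are illustrative choices, not taken from the paper.

```python
def find_runs(s):
    """Return all runs of s as sorted (start, end, period) triples.

    A run is a maximal substring s[start:end] whose smallest period p
    satisfies end - start >= 2 * p. Brute force: O(n^2) letter comparisons.
    """
    n = len(s)
    runs = {}  # (start, end) -> smallest period found for that interval
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the maximal stretch on which s[k] == s[k + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            # s[i:j + p] has period p; keep it if it spans at least 2p letters.
            if j - i >= p:
                # Periods are tried in increasing order, so the first period
                # recorded for an interval is its smallest one.
                runs.setdefault((i, j + p), p)
            i = j + 1
    return sorted((start, end, p) for (start, end), p in runs.items())


print(find_runs("abababb"))  # [(0, 6, 2), (5, 7, 1)]
```

In "abababb" the two runs are "ababab" (period 2) and "bb" (period 1). The sketch compares letters only for equality, whereas the bounds in the abstract concern comparisons over an ordered alphabet.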
Related papers
Crochemore's Repetitions Algorithm Revisited - Computing Runs
Crochemore’s repetitions algorithm introduced in 1981 was the first O(n log n) algorithm for computing repetitions. Since then, several linear-time worst-case algorithms for computing runs have been introduced. They all follow a similar strategy: first compute the suffix tree or array, then use the suffix tree or array to compute the Lempel-Ziv factorization, then using the Lempel-Ziv factorizat...
Lempel-Ziv Factorization Using Less Time & Space
For 30 years the Lempel-Ziv factorization LZ_x of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. Traditionally the standard method for computing LZ_x was based on Θ(n)-time (or, depending on the measure used, O(n log n)-time) processing of the suffix tree ST_x of x. Recen...
Computing Longest Previous Factor in linear time and applications
We give two optimal linear-time algorithms for computing the Longest Previous Factor (LPF) array corresponding to a string w. For any position i in w, LPF[i] gives the length of the longest factor of w starting at position i that occurs previously in w. Several properties and applications of LPF are investigated. They include computing the Lempel-Ziv factorization of a string and detecting all ...
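The LPF definition above, and its link to the Lempel-Ziv factorization, can be sketched naively in Python. This is a quadratic-or-worse illustration, not the linear-time algorithms of that paper. It assumes 0-based positions, that a previous occurrence may overlap position i, and the common convention that a letter with no earlier occurrence forms a factor of length 1. The names `lpf_array` and `lz_from_lpf` are hypothetical.

```python
def lpf_array(w):
    """LPF[i] = length of the longest factor starting at i that also
    starts at some earlier position j < i (overlaps allowed)."""
    n = len(w)
    lpf = [0] * n
    for i in range(n):
        for j in range(i):
            k = 0
            # j < i, so j + k < i + k < n and the index stays in range.
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            lpf[i] = max(lpf[i], k)
    return lpf


def lz_from_lpf(w, lpf):
    """Greedy Lempel-Ziv factorization read off the LPF array."""
    factors = []
    i = 0
    while i < len(w):
        # A new letter (LPF[i] == 0) becomes a factor of length 1.
        step = max(1, lpf[i])
        factors.append(w[i:i + step])
        i += step
    return factors


w = "abababb"
lpf = lpf_array(w)
print(lpf)                  # [0, 0, 4, 3, 2, 1, 1]
print(lz_from_lpf(w, lpf))  # ['a', 'b', 'abab', 'b']
```

Once the LPF array is available, the factorization is a single left-to-right pass, so the cost of computing it is dominated by the cost of computing LPF.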
Faster Compact On-Line Lempel-Ziv Factorization
We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N log N) time and uses only O(N log σ) bits of working space, where N is the length of the string and σ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N log^3 N)...
Lempel-Ziv factorization: Simple, fast, practical
For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...